Global Mosquito Alert Models

Model Overview

These models provide estimates of the probability of reports being sent through Mosquito Alert during a given month from each area shown on the map, assuming full sampling effort everywhere. In the case of mosquito bite reports, the estimates are of the probability of at least one such report being sent during the given month. In the case of the targeted adult mosquito reports (Ae. albopictus, Ae. koreicus, Ae. japonicus, or Culex), the estimates are of the probability of at least one report being sent during the given month that is classified by the digital entolab experts as possibly or probably representing the given target species/genus (i.e. a score of 1 or 2).

These estimates should be roughly correlated with the probability of a person being bitten by a mosquito or of encountering an adult mosquito of each given target species/genus. Note, however, that the probabilities faced by a person are not what is being directly modeled. Instead, the models start with a set of spatio-temporal units (see more on this below) and estimate the probability of reports being received from each of these units.

Units of analysis

The units of analysis in these models are areal-unit-months, with the areal units defined by level 4 of the Global Administrative Areas Database (GADM) (Global Administrative Areas 2022; see Brigham, Gilbert, and Xu 2011), with slight modifications. GADM level 4 corresponds to Eurostat’s Local Administrative Units Level 2, which are municipalities in most of the countries included in the model. For those countries for which GADM level 4 is not available (e.g. Andorra), the next lowest GADM level is used instead. (GADM is used here instead of Eurostat’s NUTS or LAU because these models will soon be expanded beyond Europe.)

Model estimates are shown on a map that can be zoomed out to reveal GADM levels 3, 2, and 1 as well. For these units, estimates are made by aggregating probabilities such that the interpretation remains the same: The estimate at each level is of the probability of at least one report being sent from the given unit.
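
One way to aggregate so that the interpretation is preserved is sketched below. It assumes that reports from the child units are independent, which the text above does not state, so the probability of at least one report from a parent unit is one minus the product of the child units' probabilities of no report. The function name and the example values are hypothetical.

```python
import math

def prob_at_least_one(child_probs):
    """Probability of at least one report across child units.

    Assumption (not confirmed by the source): child units are independent,
    so P(at least one) = 1 - prod(1 - p_i).
    """
    probs = list(child_probs)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(p == 1.0 for p in probs):
        return 1.0  # a certain report in any child makes the parent certain
    # Sum log(1 - p) rather than multiplying, to limit underflow
    # when a parent unit contains many child units.
    log_none = sum(math.log1p(-p) for p in probs)
    return -math.expm1(log_none)  # equals 1 - exp(log_none)

# Hypothetical GADM level 4 estimates inside one level 3 unit.
level3_estimate = prob_at_least_one([0.10, 0.20, 0.05])
```

Under this assumption the parent estimate is never smaller than the largest child estimate, which matches the "at least one report" reading at every level.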

Model Specification

The models are Bayesian multilevel logistic regressions estimated using Stan (Stan Development Team 2022) via the brms interface for R (Bürkner 2021; R Core Team 2022). The log odds of at least one report (as explained above) is modeled as a function of a set of landcover variables, a set of weather variables, sampling effort, and area, with random intercepts at the GID 1 level.

For each of the three targeted Aedes species (Ae. albopictus, Ae. koreicus, and Ae. japonicus) the models are as follows:

\textrm{log} \frac{\pi_{ijt}}{1-\pi_{ijt}} = \textrm{log}(\textrm{SE}_{i}) + \textrm{log}(\textrm{area}_{i}) + \alpha_{1j} + \beta_{1}\textrm{TEMP}_{it} + \beta_{2}\textrm{TEMP}_{it}^2 + \beta_{3}\textrm{RH}_{it} + \beta_{4}\textrm{W}_{it} + \beta_{5}\textrm{DUF}_{i} + \beta_{6}\textrm{CUF}_{i} + \beta_{7}\textrm{FOR}_{i} + \beta_{8}\textrm{AGR}_{i}

For Culex, the model is:

\textrm{log} \frac{\pi_{ijt}}{1-\pi_{ijt}} = \textrm{log}(\textrm{SE}_{i}) + \textrm{log}(\textrm{area}_{i}) + \alpha_{1j} + \beta_{1}\textrm{TEMP}_{it} + \beta_{2}\textrm{TEMP}_{it}^2 + \beta_{3}\textrm{RH}_{it} + \beta_{4}\textrm{DUF}_{i} + \beta_{5}\textrm{GUF}_{i} + \beta_{6}\textrm{AGR}_{i}

For bites, the model is:

\textrm{log} \frac{\pi_{ijt}}{1-\pi_{ijt}} = \textrm{log}(\textrm{SE}_{i}) + \textrm{log}(\textrm{area}_{i}) + \alpha_{1j} + \beta_{1}\textrm{TEMP}_{it} + \beta_{2}\textrm{TEMP}_{it}^2 + \beta_{3}\textrm{RH}_{it} + \beta_{4}\textrm{W}_{it} + \beta_{5}\textrm{DUF}_{i} + \beta_{6}\textrm{CUF}_{i} + \beta_{7}\textrm{GUF}_{i} + \beta_{8}\textrm{FOR}_{i} + \beta_{9}\textrm{AGR}_{i}

The covariates in these models are explained further below.
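
To make the structure of these equations concrete, the following minimal sketch evaluates the Aedes linear predictor for a single areal-unit-month and converts the log odds to a probability. It is an illustration, not the fitted model: the function names, the dictionary layout, and all coefficient and covariate values are hypothetical, and the covariates are treated as plain numeric inputs. Sampling effort and area enter as offsets, that is, as log terms with a coefficient fixed at 1.

```python
import math

def inv_logit(eta):
    """Numerically stable inverse of the log-odds (logit) transform."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def aedes_report_probability(se, area, alpha_j, beta, x):
    """Evaluate the Aedes linear predictor for one areal-unit-month.

    se, area : offsets, entering as log(se) and log(area) with coefficient 1
    alpha_j  : random intercept of the GID 1 unit that contains the area
    beta     : coefficients keyed by covariate name (hypothetical values)
    x        : covariate values keyed by TEMP, RH, W, DUF, CUF, FOR, AGR
    """
    eta = math.log(se) + math.log(area) + alpha_j
    # Temperature enters with a linear and a quadratic term.
    eta += beta["TEMP"] * x["TEMP"] + beta["TEMP2"] * x["TEMP"] ** 2
    for name in ("RH", "W", "DUF", "CUF", "FOR", "AGR"):
        eta += beta[name] * x[name]
    return inv_logit(eta)

# Made-up coefficients and covariates, for illustration only.
beta = {"TEMP": 0.4, "TEMP2": -0.01, "RH": 0.02, "W": -0.1,
        "DUF": 0.5, "CUF": 0.8, "FOR": -0.3, "AGR": -0.2}
x = {"TEMP": 24.0, "RH": 60.0, "W": 3.0,
     "DUF": 0.2, "CUF": 0.1, "FOR": 0.3, "AGR": 0.4}

# One plausible reading of "full sampling effort" is SE = 1, so log(SE) = 0.
p = aedes_report_probability(se=1.0, area=1.0, alpha_j=-6.0, beta=beta, x=x)
```

Because log(SE) is an offset, halving SE halves the odds of a report while leaving the covariate effects unchanged.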

Covariate Definitions

  • SE: Sampling effort. This is estimated from Mosquito Alert’s optional background tracking module, which provides approximately 5 locations per day, at random times, for each participant who has not opted out of it. All locations are masked to a grid of 0.025 degrees latitude and longitude before being transmitted from the participant’s device to the server. Each participant’s propensity to send any report is estimated, based on how long the participant has had the app, as the discrete empirical hazard from the reporting data. SE depends on the number of participants in each sampling-cell-month and on each of these participants’ reporting propensities, with propensities aggregated such that the SE value can be interpreted as the probability of at least one report (of any type, valid or not valid) coming from the given spatio-temporal unit.
  • TEMP: Temperature at 2 meters above the surface in °C. From ERA5-Land monthly averaged data from 1950 to present.
  • RH: Relative humidity. From ERA5-Land monthly averaged data from 1950 to present.
  • W: Windspeed in meters per second. From ERA5-Land monthly averaged data from 1950 to present.
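
The sampling-effort calculation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: masking is assumed to snap each location down to the index of its 0.025 degree cell, participants are assumed to report independently, and each participant is counted once per cell-month. The function names and the record layout are hypothetical.

```python
import math

CELL_DEG = 0.025  # grid resolution given in the SE definition

def mask_to_grid(lat, lon, cell=CELL_DEG):
    """Return integer cell indices for a location.

    Assumption: masking floors each coordinate to its cell; the source
    does not say whether flooring, rounding, or cell centers are used.
    """
    return (math.floor(lat / cell), math.floor(lon / cell))

def sampling_effort(fixes):
    """Aggregate reporting propensities into SE per sampling-cell-month.

    fixes: iterable of (participant_id, lat, lon, month, propensity).
    Returns {(cell, month): probability of at least one report}.
    """
    per_unit = {}
    for pid, lat, lon, month, propensity in fixes:
        key = (mask_to_grid(lat, lon), month)
        # Keep one propensity per participant per cell-month (assumption).
        per_unit.setdefault(key, {})[pid] = propensity
    effort = {}
    for key, by_participant in per_unit.items():
        prob_none = 1.0
        for propensity in by_participant.values():
            prob_none *= 1.0 - propensity  # independence assumption
        effort[key] = 1.0 - prob_none
    return effort

# Hypothetical background-tracking fixes with made-up propensities.
example_fixes = [
    ("a", 41.39, 2.16, "2022-07", 0.1),
    ("b", 41.38, 2.17, "2022-07", 0.2),
    ("a", 41.385, 2.165, "2022-07", 0.1),
    ("c", 40.41, -3.71, "2022-07", 0.3),
]
example_effort = sampling_effort(example_fixes)
```

Returning integer cell indices rather than rounded coordinates avoids floating-point mismatches when the cells are used as dictionary keys.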

Model Estimates

The following plots show model estimates at the GADM-4 level (the smallest spatial unit of analysis) for each month during 2022 for each of the target species/genus and for mosquito bites. The final plot then shows the temporal patterns of the estimates from January 2021 to present.

Ae. albopictus

albopictus estimates

Ae. koreicus

koreicus estimates

Ae. japonicus

japonicus estimates

Culex

Culex estimates

Bites

Bite estimates

References

Brigham, Charles, Steven Gilbert, and Qiyang Xu. 2011. “Open Geospatial Data: An Assessment of Global Boundary Datasets.” World Bank Institute.
Bürkner, Paul-Christian. 2021. “Bayesian Item Response Modeling in R with brms and Stan.” Journal of Statistical Software 100 (5): 1–54. https://doi.org/10.18637/jss.v100.i05.
Global Administrative Areas. 2022. “GADM database of Global Administrative Areas, Version 4.1. [Online].” http://www.gadm.org.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Stan Development Team. 2022. “Stan Modeling Language Users Guide and Reference Manual.” https://mc-stan.org/.